[SpecialCaseList] Add RadixTree for substring matching #164545
Created using spr 1.3.7 [skip ci]
Created using spr 1.3.7
@llvm/pr-subscribers-llvm-support

Author: Vitaly Buka (vitalybuka)

Changes

This commit adds a new RadixTree to

Full diff: https://github.com/llvm/llvm-project/pull/164545.diff

2 Files Affected:
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index c077f8857c9c8..f66cd6fe733a7 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -170,6 +170,10 @@ class SpecialCaseList {
               RadixTree<iterator_range<StringRef::const_iterator>,
                         SmallVector<const GlobMatcher::Glob *, 1>>>
         SuffixPrefixToGlob;
+
+    RadixTree<iterator_range<StringRef::const_iterator>,
+              SmallVector<const GlobMatcher::Glob *, 1>>
+        SubstrToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 15367afd91e72..37fd5bfad750d 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -94,6 +94,19 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
     StringRef Prefix = G.Pattern.prefix();
     StringRef Suffix = G.Pattern.suffix();
 
+    if (Suffix.empty() && Prefix.empty()) {
+      // If both prefix and suffix are empty put into special tree to search by
+      // substring in a middle.
+      StringRef Substr = G.Pattern.longest_substr();
+      if (!Substr.empty()) {
+        // But only if substring is not empty. Searching this tree is more
+        // expensive.
+        auto &V = SubstrToGlob.emplace(Substr).first->second;
+        V.emplace_back(&G);
+        continue;
+      }
+    }
+
     auto &PToGlob = SuffixPrefixToGlob.emplace(reverse(Suffix)).first->second;
     auto &V = PToGlob.emplace(Prefix).first->second;
     V.emplace_back(&G);
@@ -116,6 +129,21 @@ void SpecialCaseList::GlobMatcher::match(
       }
     }
   }
+
+  if (!SubstrToGlob.empty()) {
+    // As we don't know when substring exactly starts, we will try all
+    // possibilities. In most cases search will fail on first characters.
+    for (StringRef Q = Query; !Q.empty(); Q = Q.drop_front()) {
+      for (const auto &[_, V] : SubstrToGlob.find_prefixes(Q)) {
+        for (const auto *G : reverse(V)) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            break;
+          }
+        }
+      }
+    }
+  }
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
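
The lookup loop in the second hunk can be illustrated outside LLVM. The following is a minimal standalone sketch, not the code from this PR: `Glob`, `Index`, `addGlob`, `findPrefixes`, `wildcardMatch`, and `matchAll` are hypothetical names, a `std::map` stands in for the RadixTree, and the matcher handles only `*`. It shows the same two-stage idea: drop characters from the front of the query one at a time, collect the globs whose key is a prefix of what remains, then confirm each candidate with a full match against the whole query.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a glob entry: the full pattern plus the literal
// substring that every matching query must contain (used as the index key).
struct Glob {
  std::string Pattern; // only '*' is supported in this sketch
  std::string Key;
};

// A std::map stands in for the radix tree (assumption made for brevity).
using Index = std::map<std::string, std::vector<const Glob *>>;

inline void addGlob(Index &Idx, const Glob &G) {
  assert(!G.Key.empty() && "globs without a plain substring are not indexed");
  Idx[G.Key].push_back(&G);
}

// Minimal '*'-only wildcard matcher using iterative backtracking.
inline bool wildcardMatch(std::string_view P, std::string_view S) {
  const size_t NPos = std::string_view::npos;
  size_t p = 0, s = 0, Star = NPos, Mark = 0;
  while (s < S.size()) {
    if (p < P.size() && P[p] == '*') {
      Star = p++; // remember the star and where it started matching
      Mark = s;
    } else if (p < P.size() && P[p] == S[s]) {
      ++p;
      ++s;
    } else if (Star != NPos) {
      p = Star + 1; // let the last star absorb one more character
      s = ++Mark;
    } else {
      return false;
    }
  }
  while (p < P.size() && P[p] == '*')
    ++p;
  return p == P.size();
}

// Returns every glob whose key is a prefix of Q. A radix tree answers this in
// a single walk; the map needs one lookup per prefix length.
inline std::vector<const Glob *> findPrefixes(const Index &Idx,
                                              std::string_view Q) {
  std::vector<const Glob *> Out;
  for (size_t Len = 1; Len <= Q.size(); ++Len) {
    auto It = Idx.find(std::string(Q.substr(0, Len)));
    if (It != Idx.end())
      Out.insert(Out.end(), It->second.begin(), It->second.end());
  }
  return Out;
}

// Try every start position in Query, then verify candidates with a full match.
inline std::vector<std::string> matchAll(const Index &Idx,
                                         std::string_view Query) {
  std::vector<std::string> Hits;
  for (std::string_view Q = Query; !Q.empty(); Q.remove_prefix(1))
    for (const Glob *G : findPrefixes(Idx, Q))
      if (wildcardMatch(G->Pattern, Query))
        Hits.push_back(G->Pattern);
  return Hits;
}
```

In this sketch a glob is reported once per occurrence of its key in the query, so a caller that needs unique results would have to deduplicate. The index only narrows the candidate set; the full match remains the source of truth.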
Finds longest (almost) plain substring in the pattern.

Implementation is conservative to avoid false positives.

The result is not used to optimize `GlobPattern::match()` so it's calculated on request.

For
* #164545

---------

Co-authored-by: Luke Lau <luke@igalia.com>
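
To make "longest plain substring" concrete, here is a small standalone sketch of one conservative way to compute it. It is not LLVM's `longest_substr()` and may differ from it in every detail: `longestPlainSubstr` is a hypothetical name, and the rule it applies (give up entirely on brackets, braces, and escapes, otherwise split on `*` and `?`) is an assumption chosen so that the result is always text a matching string must contain.

```cpp
#include <string_view>

// Returns the longest run of literal characters in a glob pattern, or an
// empty view when the pattern uses syntax this sketch does not model.
// Returning empty is always safe: the caller then skips the substring index.
inline std::string_view longestPlainSubstr(std::string_view Pat) {
  // Conservative bail-out: character classes, alternation, and escapes
  // (assumed metacharacters for this sketch) are not analyzed at all.
  if (Pat.find_first_of("[]{}\\") != std::string_view::npos)
    return {};

  std::string_view Best;
  size_t Start = 0;
  while (Start <= Pat.size()) {
    // A literal run ends at the next wildcard or at the end of the pattern.
    size_t End = Pat.find_first_of("*?", Start);
    if (End == std::string_view::npos)
      End = Pat.size();
    std::string_view Run = Pat.substr(Start, End - Start);
    if (Run.size() > Best.size()) // on ties the earliest run is kept
      Best = Run;
    Start = End + 1;
  }
  return Best;
}
```

For example, under these assumptions `*foo*barbaz*` yields `barbaz`, while `*[ab]c*` yields an empty result rather than risking a key that some matching string does not contain.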
nit: "Use RadixTree" in commit message?
This commit introduces a RadixTree implementation to LLVM.

RadixTree, as a trie, is very efficient at searching for prefixes. A radix tree is a more compact implementation of a trie: chains of single-child nodes are merged into one edge that carries a whole substring.

The tree will be used to optimize Glob matching in SpecialCaseList:
* #164531
* #164543
* #164545

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
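
As an illustration of why a radix tree suits prefix queries, here is a minimal standalone sketch. It is not LLVM's `RadixTree`: `RadixNode`, `insert`, and `findPrefixes` are hypothetical names, keys are plain strings with no associated values, and edges are stored in a `std::map` for simplicity. Each edge carries a multi-character label, and a single walk from the root reports every stored key that is a prefix of the query.

```cpp
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct RadixNode {
  bool IsKey = false; // true if the path from the root spells a stored key
  // Outgoing edges indexed by their first character. Labels are never empty.
  std::map<char, std::pair<std::string, std::unique_ptr<RadixNode>>> Edges;
};

inline void insert(RadixNode &Root, std::string_view Key) {
  RadixNode *N = &Root;
  while (!Key.empty()) {
    auto It = N->Edges.find(Key[0]);
    if (It == N->Edges.end()) {
      // No edge starts with this character: hang the rest of the key here.
      auto Leaf = std::make_unique<RadixNode>();
      RadixNode *L = Leaf.get();
      N->Edges.emplace(Key[0],
                       std::make_pair(std::string(Key), std::move(Leaf)));
      N = L;
      break;
    }
    auto &[Label, Child] = It->second;
    size_t C = 0; // length of the common prefix of Label and Key
    while (C < Label.size() && C < Key.size() && Label[C] == Key[C])
      ++C;
    if (C < Label.size()) {
      // Split the edge: the old child moves below a new intermediate node.
      auto Mid = std::make_unique<RadixNode>();
      Mid->Edges.emplace(Label[C],
                         std::make_pair(Label.substr(C), std::move(Child)));
      Label.resize(C);
      Child = std::move(Mid);
    }
    N = Child.get();
    Key.remove_prefix(C);
  }
  N->IsKey = true;
}

// Returns all stored keys that are prefixes of Q, shortest first.
inline std::vector<std::string> findPrefixes(const RadixNode &Root,
                                             std::string_view Q) {
  std::vector<std::string> Out;
  const RadixNode *N = &Root;
  size_t Pos = 0;
  while (Pos < Q.size()) {
    auto It = N->Edges.find(Q[Pos]);
    if (It == N->Edges.end())
      break;
    const std::string &Label = It->second.first;
    // The whole label must match; a query that ends mid-label fails here.
    if (Q.compare(Pos, Label.size(), Label) != 0)
      break;
    Pos += Label.size();
    N = It->second.second.get();
    if (N->IsKey)
      Out.emplace_back(Q.substr(0, Pos));
  }
  return Out;
}
```

The walk stops at the first mismatching edge, so a query that shares no leading characters with any key is rejected after a single map lookup.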
This commit adds a new RadixTree to `SpecialCaseList` for handling substring matches. Previously, `SpecialCaseList` only supported prefix and suffix matching. With this change, patterns that have neither prefixes nor suffixes can now be efficiently filtered.

According to SpecialCaseListBM:

Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN -0.7809
```

Lookup `*test*`-like benchmarks (huge improvements):
```
OVERALL_GEOMEAN -0.9947
```

https://gist.github.com/vitalybuka/ee7f681b448eb18974386ab35e2d4d27
LLVM Buildbot has detected a new failure on builder

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/12294

Here is the relevant piece of the build log for the reference